Talend Big Data – Spark Streaming
Subscription: This content is available for Talend Academy subscription users.
Instructor-led: This content is available as instructor-led training.
This learning plan concentrates on Big Data Spark Jobs. It focuses mainly on Big Data Streaming Jobs but also introduces Big Data Batch Jobs.
After an introduction to Apache Kafka and Apache Spark, you work on a log processing use case, a common big data scenario. You see how to publish messages to Kafka, subscribe to receive messages, insert data into Elasticsearch, and use Kibana to create charts and dashboards. You also see how to save data to and read data from HBase tables.
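In the course, these flows are built graphically in Talend Studio with Kafka components rather than hand-written code. As a rough, hedged illustration of what publishing to and subscribing from Kafka involves, here is a minimal sketch using the Apache Kafka Java client directly; the broker address, the "logs" topic name, and the sample log line are assumptions made for this example.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaLogDemo {

    // Publish a single log line to the "logs" topic (topic name assumed for this sketch)
    static void publish(String brokers, String logLine) {
        Properties props = new Properties();
        props.put("bootstrap.servers", brokers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("logs", logLine));
        }
    }

    // Subscribe to the "logs" topic and print whatever arrives in one poll
    static void subscribe(String brokers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", brokers);
        props.put("group.id", "log-demo");
        props.put("auto.offset.reset", "earliest"); // read from the start of the topic
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("logs"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.value());
            }
        }
    }

    public static void main(String[] args) {
        publish("localhost:9092", "192.168.0.42 - GET /index.html 200"); // assumed local broker
        subscribe("localhost:9092");
    }
}
```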
Duration: 1 day (7 hours)
Target audience: Anyone who wants to use Talend Studio to interact with big data systems
Prerequisites: Completion of Talend Big Data Basics
Badge: Complete this learning plan to earn the Talend Big Data Developer Practitioner badge. To learn more about the criteria for earning this badge, refer to the Talend Academy Badging Program page.
Learning objectives: After completing this learning plan, you will be able to:
- Connect to a Hadoop cluster from a Talend Job
- Use context variables and metadata
- Read and write files in HDFS or HBase in a Big Data Batch or Big Data Streaming Job
- Read and write messages in a Kafka topic in real time
- Configure a Big Data Batch Job to use the Spark framework
- Configure a Big Data Streaming Job to use the Spark Streaming framework (see the sketch after this list)
- Save logs to Elasticsearch
- Configure a Kibana dashboard
- Ingest a stream of data into a NoSQL database, HBase
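Talend Big Data Streaming Jobs are configured graphically and run on the Spark Streaming framework. As a hedged, framework-level sketch of the kind of processing such a Job performs, the example below reads a Kafka topic with the Spark Streaming Java API and counts messages per micro-batch; the broker address, the "logs" topic name, the consumer group, and the 5-second batch interval are all assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class LogStreamJob {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("LogStreamJob").setMaster("local[2]");
        // Micro-batch interval of 5 seconds (an arbitrary choice for this sketch)
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "log-stream-demo");

        Collection<String> topics = Arrays.asList("logs"); // hypothetical topic name

        JavaInputDStream<ConsumerRecord<String, String>> stream =
            KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

        // Count the messages received in each micro-batch and print the result
        stream.map(ConsumerRecord::value)
              .count()
              .print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```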
Training modules: To complete the learning plan, take the following modules: